blk-iolatency: flush enable work after policy deactivation#980
Open
blktests-ci[bot] wants to merge 1 commit into
Author
Upstream branch: 66affa3
e6d9eb8 to 7d8604f
A blk-iolatency rq-qos teardown can free struct blk_iolatency while a freshly queued enable_work callback still references it.

The observed failure is: blkcg_iolatency_exit() flushes enable_work before deactivating the iolatency policy. However, blkcg_deactivate_policy() calls iolatency_pd_offline() for online policy data, and iolatency_pd_offline() clears min_lat_nsec through iolatency_set_min_lat_nsec(). If this clears the last nonzero latency target, enable_cnt reaches zero and schedules enable_work again after the flush has already returned.

The buggy scenario involves two paths, with each column showing the order within that path:

  blkcg_iolatency_exit() path:          system_wq worker path:
  1. Flush old enable_work.             1. enable_work is idle.
  2. Deactivate the policy.             2. no worker owns it.
  3. Offline queues new enable_work.    3. work item becomes pending.
  4. Free blkiolat.                     4. worker later runs the item.
  5. Owner storage is gone.             5. worker dereferences blkiolat.

Flush enable_work again after blkcg_deactivate_policy() returns and before freeing blkiolat. Policy offline callbacks have completed at that point, so the second drain covers the late queueing path without changing the normal enable/disable accounting rules.

Validation reproduced this kernel report:

  BUG: KASAN: slab-use-after-free in assign_work+0x2a/0x150

  Call Trace:
   <TASK>
   dump_stack_lvl+0x53/0x70
   print_report+0xd0/0x630
   ? __pfx__raw_spin_lock_irqsave+0x10/0x10
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? __virt_addr_valid+0xea/0x1a0
   ? assign_work+0x2a/0x150
   kasan_report+0xce/0x100
   ? assign_work+0x2a/0x150
   assign_work+0x2a/0x150
   worker_thread+0x1b7/0x500
   ? __pfx_worker_thread+0x10/0x10
   kthread+0x192/0x1d0
   ? __pfx_kthread+0x10/0x10
   ret_from_fork+0x2ac/0x3c0
   ? __pfx_ret_from_fork+0x10/0x10
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? __switch_to+0x2d5/0x6e0
   ? __pfx_kthread+0x10/0x10
   ret_from_fork_asm+0x1a/0x30
   </TASK>

  Allocated by task 470:
   kasan_save_stack+0x33/0x60
   kasan_save_track+0x14/0x30
   __kasan_kmalloc+0x8f/0xa0
   iolatency_set_limit+0x301/0x450
   cgroup_file_write+0x178/0x2e0
   kernfs_fop_write_iter+0x1ef/0x290
   vfs_write+0x446/0x6f0
   ksys_write+0xc7/0x160
   do_syscall_64+0xf9/0x540
   entry_SYSCALL_64_after_hwframe+0x77/0x7f

  Freed by task 611:
   kasan_save_stack+0x33/0x60
   kasan_save_track+0x14/0x30
   kasan_save_free_info+0x3b/0x60
   __kasan_slab_free+0x43/0x70
   kfree+0x131/0x390
   rq_qos_exit+0x5d/0x90
   __del_gendisk+0x394/0x490
   del_gendisk+0xa1/0xe0
   virtblk_remove+0x41/0xd0
   virtio_dev_remove+0x63/0xe0
   device_release_driver_internal+0x246/0x2e0
   unbind_store+0xa9/0xb0
   kernfs_fop_write_iter+0x1ef/0x290
   vfs_write+0x446/0x6f0
   ksys_write+0xc7/0x160
   do_syscall_64+0xf9/0x540
   entry_SYSCALL_64_after_hwframe+0x77/0x7f

  Last potentially related work creation:
   kasan_save_stack+0x33/0x60
   kasan_record_aux_stack+0x8c/0xa0
   __queue_work+0x42a/0x800
   queue_work_on+0x5d/0x70
   iolatency_set_min_lat_nsec+0x196/0x230
   iolatency_pd_offline+0x1f/0x40
   blkcg_deactivate_policy+0x194/0x270
   blkcg_iolatency_exit+0x33/0x40
   rq_qos_exit+0x5d/0x90
   __del_gendisk+0x394/0x490
   del_gendisk+0xa1/0xe0
   virtblk_remove+0x41/0xd0
   virtio_dev_remove+0x63/0xe0
   device_release_driver_internal+0x246/0x2e0
   unbind_store+0xa9/0xb0
   kernfs_fop_write_iter+0x1ef/0x290
   vfs_write+0x446/0x6f0
   ksys_write+0xc7/0x160
   do_syscall_64+0xf9/0x540
   entry_SYSCALL_64_after_hwframe+0x77/0x7f

  Second to last potentially related work creation:
   kasan_save_stack+0x33/0x60
   kasan_record_aux_stack+0x8c/0xa0
   __queue_work+0x42a/0x800
   queue_work_on+0x5d/0x70
   iolatency_set_min_lat_nsec+0x196/0x230
   iolatency_set_limit+0x3f1/0x450
   cgroup_file_write+0x178/0x2e0
   kernfs_fop_write_iter+0x1ef/0x290
   vfs_write+0x446/0x6f0
   ksys_write+0xc7/0x160
   do_syscall_64+0xf9/0x540
   entry_SYSCALL_64_after_hwframe+0x77/0x7f

Fixes: 8a177a3 ("blk-iolatency: Fix inflight count imbalances and IO hangs on offline")
Assisted-by: Codex:gpt-5.5
Signed-off-by: Cen Zhang <zzzccc427@gmail.com>
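For reviewers who want to reason about the ordering without a kernel tree at hand, the following is a minimal userspace sketch of the race described in the commit message. It is not the kernel patch. Every name in it (`fake_iolat`, `fake_queue_work`, `fake_flush_work`, `fake_deactivate_policy`, `fake_exit`) is hypothetical, and the model assumes a single online policy data entry whose offline drops `enable_cnt` from one to zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct blk_iolatency and its enable_work item. */
struct fake_iolat {
    bool work_pending; /* models enable_work sitting on the workqueue */
    bool freed;        /* models the owner having been freed */
    int  enable_cnt;   /* models the count of nonzero latency targets */
};

/* Models queueing enable_work: the item becomes pending. */
static void fake_queue_work(struct fake_iolat *lat)
{
    assert(lat != NULL);
    lat->work_pending = true;
}

/* Models flushing enable_work: on return the item is idle. */
static void fake_flush_work(struct fake_iolat *lat)
{
    lat->work_pending = false;
}

/*
 * Models policy deactivation: the offline callback clears the last
 * nonzero latency target, enable_cnt reaches zero, and the work item
 * is queued again (simplified to one policy data entry).
 */
static void fake_deactivate_policy(struct fake_iolat *lat)
{
    if (lat->enable_cnt > 0 && --lat->enable_cnt == 0)
        fake_queue_work(lat);
}

/*
 * Models the exit path. Returns true if a work item is still pending
 * when the owner is freed, i.e. the use-after-free window is open.
 */
static bool fake_exit(struct fake_iolat *lat, bool flush_after_deactivate)
{
    fake_flush_work(lat);          /* step 1: flush old work */
    fake_deactivate_policy(lat);   /* steps 2-3: offline may requeue */
    if (flush_after_deactivate)
        fake_flush_work(lat);      /* proposed second drain */
    lat->freed = true;             /* step 4: free the owner */
    return lat->work_pending;
}
```

Calling `fake_exit()` with `flush_after_deactivate` set to false on an instance whose `enable_cnt` is 1 leaves the item pending at free time, which corresponds to the buggy column order; with the second drain enabled the item is idle before the free.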
Author
Upstream branch: bade58e
5aed2f4 to e7fceb3
Pull request for series with
subject: blk-iolatency: flush enable work after policy deactivation
version: 1
url: https://patchwork.kernel.org/project/linux-block/list/?series=1114347